49 research outputs found

    Clustering based on Random Graph Model embedding Vertex Features

    Large datasets with interactions between objects are common in numerous scientific fields (e.g., social science, the internet, biology). The interactions naturally define a graph, and a common way to explore or summarize such datasets is graph clustering. Most techniques for clustering graph vertices use only the topology of connections, ignoring the information in the vertex features. In this paper, we provide a clustering algorithm that exploits both types of data, based on a statistical model with a latent structure that characterizes each vertex both by a vector of features and by its connectivity. We perform simulations to compare our algorithm with existing approaches, and we also evaluate our method on real datasets of hypertext documents. We find that our algorithm successfully exploits whatever information is found in both the connectivity pattern and the features.
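The abstract does not spell out the model, so the following is only a minimal sketch under stated assumptions: edges follow a stochastic block model (an edge between a vertex in class k and one in class l appears with probability P[k][l]), and each vertex's feature vector is drawn from an isotropic Gaussian centred on its class mean. With the parameters held fixed, vertices are reassigned greedily (ICM-style) to the class that maximises the combined connectivity-plus-feature score. The parameter values, the function names `vertex_class_score` and `reassign`, and the toy graph are hypothetical, and this is not the authors' estimation algorithm.

```python
import math


def vertex_class_score(i, k, adj, features, z, pi, P, mu, sigma2):
    """Log-score for placing vertex i in class k, given the current classes z of the others."""
    score = math.log(pi[k])
    # Connectivity term: each pair (i, j) is a Bernoulli edge with probability P[k][z[j]]
    for j in range(len(adj)):
        if j == i:
            continue
        p = P[k][z[j]]
        score += math.log(p) if adj[i][j] else math.log(1.0 - p)
    # Feature term: isotropic Gaussian log-density around the class mean mu[k]
    x = features[i]
    sq = sum((xv - mv) ** 2 for xv, mv in zip(x, mu[k]))
    score += -0.5 * len(x) * math.log(2 * math.pi * sigma2) - sq / (2 * sigma2)
    return score


def reassign(adj, features, z, pi, P, mu, sigma2, n_sweeps=10):
    """Greedy ICM-style sweeps: move each vertex to its best-scoring class until stable."""
    z = list(z)
    for _ in range(n_sweeps):
        changed = False
        for i in range(len(adj)):
            best = max(
                range(len(pi)),
                key=lambda k: vertex_class_score(i, k, adj, features, z, pi, P, mu, sigma2),
            )
            if best != z[i]:
                z[i] = best
                changed = True
        if not changed:
            break
    return z


# Hypothetical toy data: two triangles {0, 1, 2} and {3, 4, 5} with 1-D features near 0 and 5
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
features = [[0.1], [-0.2], [0.0], [5.1], [4.9], [5.0]]
labels = reassign(
    adj, features, z=[0, 0, 1, 1, 1, 0],
    pi=[0.5, 0.5], P=[[0.9, 0.1], [0.1, 0.9]], mu=[[0.0], [5.0]], sigma2=1.0,
)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Here both sources of information agree, so the two mislabelled vertices are corrected in one sweep; when connectivity and features disagree, the relative size of the two log-terms decides the assignment.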

    A variational Bayesian approach to model aggregation in classification.

    International audience. We consider the case of a mixture of two populations, one of which is known and easily identifiable. Several models have been developed to model the unknown distribution. We propose an alternative that consists of taking a mixture of several Gaussian distributions with unknown means and variances. Each model provides more or less relevant information for estimating the parameters. We therefore suggest using a Bayesian model averaging (BMA) approach to account for the uncertainty attached to each model and to avoid having to choose the number of components. By averaging over a set of models, BMA computes an aggregated estimator from the information provided by the collection of models, each weighted by its model weight. In practice, this weight is estimated from the BIC, but the quality of the approximation used to obtain this criterion is debatable. We therefore turn to the variational Bayesian framework, which naturally defines a posterior distribution of the parameters and yields a weight for each model. In this work, we propose defining the weight of each model by minimizing the Kullback-Leibler divergence between the estimated distribution of the weights and the true one. A simulation study is used to evaluate the behaviour of our aggregated estimator.
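The abstract notes that, in practice, BMA weights are derived from the BIC. As a sketch of that baseline only (not the variational weights the authors propose), the code below computes BIC values, turns them into normalised weights proportional to exp(-BIC/2) under an assumed equal prior over models, and averages per-model estimates. All log-likelihoods, parameter counts and estimates are hypothetical.

```python
import math


def bic(log_lik, n_params, n_obs):
    """BIC with the 'lower is better' convention: -2 log L + p log n."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def bma_weights_from_bic(bics):
    """Approximate posterior model weights w_m proportional to exp(-BIC_m / 2).

    Assumes equal prior probability for every model. The maximum is subtracted
    before exponentiating (log-sum-exp trick) to avoid underflow.
    """
    half = [-b / 2.0 for b in bics]
    shift = max(half)
    unnorm = [math.exp(h - shift) for h in half]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def bma_estimate(estimates, weights):
    """Aggregated (model-averaged) estimator: weighted mean of per-model estimates."""
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical log-likelihoods and parameter counts for three candidate models (n = 100)
bics = [bic(-48.0, 2, 100), bic(-45.0, 5, 100), bic(-44.5, 8, 100)]
weights = bma_weights_from_bic(bics)
theta_bma = bma_estimate([1.0, 2.0, 3.0], weights)
print([round(w, 3) for w in weights], round(theta_bma, 3))
```

Because only BIC differences enter the weights, a model whose BIC is 10 units worse than the best receives a weight roughly exp(-5) times smaller, which is why BIC-based BMA often concentrates almost all weight on one or two models.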

    Hidden Markov Models with mixtures as emission distributions

    International audience. In unsupervised classification, Hidden Markov Models (HMMs) are used to account for a neighborhood structure between observations. The emission distributions are often assumed to belong to some parametric family. In this paper, we propose a semiparametric model in which the emission distributions are mixtures of parametric distributions, to obtain greater flexibility. We show that the standard EM algorithm can be adapted to infer the model parameters. For the initialization step, starting from a large number of components, we propose a hierarchical method to combine them into the hidden states. Three likelihood-based criteria for selecting the components to be combined are discussed. To estimate the number of hidden states, BIC-like criteria are derived. A simulation study is carried out both to determine the best pairing of combining criterion and model selection criterion and to evaluate classification accuracy. The proposed method is also illustrated on a biological dataset from the model plant Arabidopsis thaliana. An R package, HMMmix, is freely available on CRAN.
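To make "emission distributions are a mixture of parametric distributions" concrete, the sketch below evaluates the HMM likelihood with the standard scaled forward recursion, where each hidden state's emission density is a univariate Gaussian mixture. The Gaussian choice, the function names and the parameter values are assumptions for illustration; the EM updates and component-combining criteria of the paper are not reproduced, and the HMMmix package's own API is not used here.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_emission(x, components):
    """Emission density of one hidden state: a Gaussian mixture of (weight, mean, var) triples."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in components)


def forward_loglik(obs, init, trans, emissions):
    """Scaled forward algorithm returning log p(obs) for an HMM with mixture emissions."""
    n_states = len(init)
    alpha = [init[k] * mixture_emission(obs[0], emissions[k]) for k in range(n_states)]
    scale = sum(alpha)  # p(obs[0])
    loglik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for x in obs[1:]:
        # Propagate through the transition matrix, then weight by each state's mixture density
        alpha = [
            sum(alpha[j] * trans[j][k] for j in range(n_states)) * mixture_emission(x, emissions[k])
            for k in range(n_states)
        ]
        scale = sum(alpha)  # p(x | previous observations); rescaling avoids underflow
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


# Hypothetical two-state model: state 0 emits from a 2-component mixture, state 1 from one Gaussian
obs = [0.1, -0.3, 4.8, 5.2, 0.0]
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emissions = [
    [(0.6, -0.5, 1.0), (0.4, 0.5, 1.0)],
    [(1.0, 5.0, 0.5)],
]
print(forward_loglik(obs, init, trans, emissions))
```

In this representation, grouping several components into one hidden state simply means placing their (weight, mean, var) triples in the same list, with the weights renormalised to sum to 1.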

    Combined bacterial and fungal intestinal microbiota analyses: Impact of storage conditions and DNA extraction protocols

    Background: The human intestinal microbiota contains a vast community of microorganisms that is increasingly studied using high-throughput DNA sequencing. Standardized protocols for storage and DNA extraction from fecal samples have been established mostly for bacterial microbiota analysis. Here, we investigated the impact of storage and DNA extraction on bacterial and fungal community structures detected concomitantly. Methods: Fecal samples from healthy adults were stored at -80 °C either as such or diluted in RNAlater®, and subjected to two extraction protocols with mechanical lysis: the Powersoil® MoBio kit or the International Human Microbiota Standard (IHMS) Protocol Q. Libraries of the 12 samples targeting the V3-V4 16S and ITS1 regions were prepared using Metabiote® (Genoscreen) and sequenced on GS-FLX-454. Sequencing data were analysed using SHAMAN (http://shaman.pasteur.fr/). The bacterial and fungal microbiota were compared in terms of diversity and relative abundance. Results: We obtained 171,869 and 199,089 quality-controlled reads for 16S and ITS, respectively. All 16S reads were assigned to 41 bacterial genera; only 52% of ITS reads were assigned to 40 fungal genera/sections. Rarefaction curves were satisfactory in 3/3 and 2/3 subjects for 16S and ITS, respectively. PCoA showed substantial inter-individual variability of the intestinal microbiota, largely outweighing the effect of storage or extraction. Storage in RNAlater® affected (with a downward trend) the relative abundances of 7/41 bacterial and 6/40 fungal taxa, while extraction affected 18/41 bacterial taxa and 1/40 fungal taxon, with no consistent direction. Conclusion: Our results show that RNAlater® moderately affects bacterial or fungal community structures, while extraction significantly influences bacterial composition. For combined bacterial and fungal intestinal microbiota analysis, immediate sample freezing should be preferred when feasible, but storage in RNAlater® remains an option under unfavourable conditions or for concomitant metatranscriptomic analysis; extraction should rely on protocols validated for bacterial analysis, such as IHMS Protocol Q, that include a powerful mechanical lysis step, which is essential for fungal extraction.
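The abstract compares samples by diversity, relative abundance and rarefaction curves but does not say which index or rarefaction method was used (the analysis was run in SHAMAN). As a hedged illustration only, the sketch below computes relative abundances, the Shannon index and Hurlbert's analytical rarefaction (expected number of genera in a random subsample of n reads) from a hypothetical genus-level count table; the genus names and counts are made up.

```python
import math


def relative_abundance(counts):
    """Convert raw read counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    return -sum(p * math.log(p) for p in relative_abundance(counts).values() if p > 0)


def rarefied_richness(counts, n):
    """Expected number of taxa in a random subsample of n reads drawn without replacement.

    Hurlbert's formula: E[S_n] = sum_i [1 - C(N - N_i, n) / C(N, n)].
    """
    total = sum(counts.values())
    if not 0 < n <= total:
        raise ValueError("subsample size must be between 1 and the total number of reads")
    denom = math.comb(total, n)
    return sum(1 - math.comb(total - c, n) / denom for c in counts.values() if c > 0)


# Hypothetical genus-level read counts for one sample
sample = {"Bacteroides": 50, "Faecalibacterium": 30, "Blautia": 15, "Roseburia": 5}
print(relative_abundance(sample))
print(round(shannon(sample), 3))
# A rarefaction curve is the expected richness at increasing subsampling depths
curve = [rarefied_richness(sample, n) for n in (1, 10, 50, 100)]
print(curve)
```

A curve that flattens before reaching the full read count suggests sequencing depth was sufficient to capture most genera in that sample, which is the sense in which rarefaction curves are judged "satisfactory".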

    Bacteriophages to reduce gut carriage of antibiotic resistant uropathogens with low impact on microbiota composition.

    International audience. Uropathogenic Escherichia coli (UPEC) is the leading cause of urinary tract infections (UTIs) worldwide, causing over 150 million clinical cases annually. There is currently no specific treatment for asymptomatic gut carriage of UPEC before they initiate UTIs. This study investigates the efficacy of virulent bacteriophages in decreasing gut carriage of pathogens. Three virulent bacteriophages infecting an antibiotic-resistant UPEC strain were isolated and characterized both in vitro and in vivo. A new experimental murine model of gut carriage of E. coli was developed, and the impact of virulent bacteriophages on colonization levels and microbiota diversity was assessed. A single dose of a cocktail of the three bacteriophages led to a sharp decrease in E. coli levels throughout the gut. We also observed that microbiota diversity was much less affected by bacteriophages than by antibiotics. Therefore, virulent bacteriophages can efficiently target UPEC strains residing in the gut, with potentially profound public health and economic impacts. These results open a new area, with the possibility of specifically manipulating the microbiota using virulent bacteriophages, which could have broad applications in many gut-related disorders and diseases and beyond.

    The role of glycosylphosphatidylinositol (GPI) anchored proteins in Cryptococcus neoformans

    It is becoming increasingly clear that glycophosphatidylinositol (GPI)-anchored proteins (GAPs) play a prominent role in fungi; however, a full understanding of GAPs is lacking, especially for the human opportunistic fungus Cryptococcus neoformans. GAPs were identified using online GPI prediction tools; an overexpression mutant library for these GAP-encoding genes was then constructed, and a publicly available knockout (KO) mutant library was also used. In total, 41 overexpression and 34 KO mutants, representing 47 unique genes, were analyzed. From the analysis of the two libraries, two main gene candidates, a mannoprotein 88 (MP88) (CNAG_00776) and an uncharacterized protein (CNAG_00137), were investigated further by constructing additional independent mutant strains. The CNAG_00776 mutant showed impaired growth under plasma membrane stress and significantly decreased phagocytosis. The CNAG_00137 mutant showed impaired growth under cell wall stress or at increased temperature, and significantly decreased phagocytosis. By performing a large genetic screen of GAPs in the genome of the human fungal pathogen C. neoformans, we identified two candidate GAP genes involved in C. neoformans/host interaction and stress response. Further research into these two genes could potentially yield new targets for antifungals, treatment strategies or vaccines to manage C. neoformans disease.

    SHAMAN: a user-friendly website for metataxonomic analysis from raw reads to statistical analysis

    International audience. Comparing the composition of microbial communities among groups of interest (e.g., patients vs. healthy individuals) is a central aspect of microbiome research. It typically involves sequencing, data processing, statistical analysis and graphical display. Such an analysis is normally performed with a set of different applications that require specific expertise for installation and data processing, and in some cases programming skills.
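SHAMAN's own normalisation and statistical testing are not described in this abstract, so the sketch below is not its method. It only illustrates the basic group comparison the abstract describes, on hypothetical count tables: raw reads are converted to proportions (total-sum scaling) and, for each taxon, the log2 ratio of mean relative abundance between two groups is computed, with a small pseudocount (an assumed value) to avoid division by zero.

```python
import math


def to_proportions(sample):
    """Total-sum scaling: divide each taxon's read count by the sample's total reads."""
    total = sum(sample.values())
    return {taxon: c / total for taxon, c in sample.items()}


def log2_fold_changes(group_a, group_b, pseudocount=1e-6):
    """Per-taxon log2(mean proportion in B / mean proportion in A); missing taxa count as 0."""
    props_a = [to_proportions(s) for s in group_a]
    props_b = [to_proportions(s) for s in group_b]
    taxa = sorted({t for s in group_a + group_b for t in s})

    def mean_prop(props, taxon):
        return sum(p.get(taxon, 0.0) for p in props) / len(props)

    return {
        t: math.log2((mean_prop(props_b, t) + pseudocount) / (mean_prop(props_a, t) + pseudocount))
        for t in taxa
    }


# Hypothetical read counts per taxon for two groups of samples
healthy = [{"Bacteroides": 800, "Escherichia": 20}, {"Bacteroides": 700, "Escherichia": 30}]
patients = [{"Bacteroides": 400, "Escherichia": 300}, {"Bacteroides": 500, "Escherichia": 250}]
print(log2_fold_changes(healthy, patients))
```

Positive values mark taxa relatively more abundant in the second group; a real analysis would pair such effect sizes with a statistical test and a correction for multiple comparisons.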
